Take Home Exercise-4

Reveal the daily routines of two selected participant of the city of Engagement, Ohio USA.

Shachi Anirudha Raodeo , true
2022-05-25

The Task

This take-home exercise aims to reveal the daily life of the participants of the city of Engagement, Ohio USA by using visualization techniques in R.

This challenge considers the patterns of daily life throughout the city. To describe the daily routines for some representative people, characterize the travel patterns to identify potential bottlenecks or hazards, and examine how these patterns change over time and seasons.

Assume the participants have given permission to have their daily routines captured. Choose two different participants with different routines and describe their daily patterns, with supporting evidence. Limit your response to 10 images and 500 words.

We consider participants with ID 4 and 486 for the study who have stark differences in their age, joviality, education and household.

Links to the dataset:

Buildings.csv TravelJournal.csv ParticipantStatusLogs10.csv

Step-by-step Data Visualisation

Installing and launching R packages

Packages, namely tidyverse and ggplot2and ViSiElse are required for this exercise. This code chunk installs the required packages and loads them onto RStudio environment.

packages = c('tidyverse','ggplot2','ggdist', 'ggridges','patchwork', 'ggthemes','hrbrthemes','ggrepel','ggforce',"HH","vcd",'scales','grid','gridExtra','formattable','readr', 'ggiraph', 'plotly', 'DT', 'gganimate','readxl','gifski','gapminder','treemap','treemapify','rPackedBar','ggstatsplot','ggside','broom','crosstalk','ViSiElse','zoo', 'lubridate', 'remotes', 'trelliscopejs','data.table','sf','tmap','sf','clock','sftime','rmarkdown')

for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

Data Preparation

Data Source

The dataset used in this exercise is Participants.csv, published by the IEEE for [VAST challenge 2022] (https://vast-challenge.github.io/2022/)

Importing the dataset

The code chunk below imports Buildings.csv , TravelJournal.csv, ParticipantStatusLogs10.csv from the data folder into R by using read_csv() function of readr and saves it as Tibble data frame called buildings, travel, logs

The TravelJournal.csv contains information about participants’ motivation for movement around the city.

travel <- read_csv("data/TravelJournal.csv")
summary(travel)
buildings <- read_sf("data/Buildings.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
logs <- read_csv("data/ParticipantStatusLogs10.csv")

Data Wrangling

The Travel Journal contains financial transactions by a participant towards Work/Home Commute, Eating, Coming Back From Restaurant,Recreation (Social Gathering), Going Back to Home. The checkin journal contains checkin details into a Restaurant, Pub, Apartment or Workplace

We filter out the records related participant 4 and 486 We choose participant 4 for analysis as participant 4 - has kids in his household, has a household size of 3, is aged 43, has completed his/her bachelors and has a high joviality of 0.85

As compared to participant 4 participant 486 - has no lids in household, has a household size of 1, is aged 29, has low education level and a low joviality of 0.02

We aim to find the lifestyle patterns and daily routine of both these participants.

travel_filt <- travel[travel$participantId %in% c("4","486"),]
logs <- logs[logs$participantId %in% c("4","486"),]
logs_plot <- logs_plot[logs_plot$participantId %in% c("4","486"),]

Calculating Amount Spent

Calculating the total amount spent at the location as a difference of the starting balance and ending balance in the travel journal gives us the amount spent by the participant at a particular location.

travel_filt$amountSpent <- travel_filt$endingBalance -travel_filt$startingBalance

Extract information from timestamp

We use weekdays(), day(), month(), year() functions to extract the day of the week, date, moth and year of checkin to perform time series visualizations.

Calculate the time spent at a particular place and the travel time

Calculate travel time as the difference between the travel start time and the travel end time and calculate the time spent as the difference of check in and check out times.

data_travel= travel_filt %>%
  mutate(weekday = weekdays(checkInTime),
         day = day(checkInTime),
         month=as.character(checkInTime,"%b %y"),
         year = year(checkInTime),
         monthYear = floor_date(checkInTime, "month"),
         travelEndLocationId=as.character(travelEndLocationId),
         timeSpent = checkOutTime - checkInTime,
         travelTime = travelEndTime- travelStartTime,
         participantId=as.character(participantId),
         purpose=as.character(purpose))

data_travel$timeSpent <- as.numeric(as.character(data_travel$timeSpent))
data_travel$travelTime <- as.numeric(as.character(data_travel$travelTime))

Filter necessary columns

data_travel <- data_travel[,c("participantId","travelStartLocationId", "travelEndLocationId", "purpose", "checkInTime", "amountSpent","timeSpent","travelTime","weekday","day","month","year","monthYear")]

Save files as RDS

saveRDS ( data_travel, 'data/data_travel.rds')
saveRDS ( logs, 'data/logs.rds')
saveRDS ( logs_plot, 'data/logs_plot.rds')
data_travel <- readRDS ( 'data/data_travel.rds')
head (data_travel)
# A tibble: 6 × 13
  participantId travelStartLocationId travelEndLocationId purpose     
  <chr>                         <dbl> <chr>               <chr>       
1 4                               194 411                 Work/Home C…
2 4                               411 448                 Eating      
3 486                             945 1311                Work/Home C…
4 486                            1311 1345                Eating      
5 4                               448 411                 Coming Back…
6 486                            1345 1311                Coming Back…
# … with 9 more variables: checkInTime <dttm>, amountSpent <dbl>,
#   timeSpent <dbl>, travelTime <dbl>, weekday <chr>, day <int>,
#   month <chr>, year <int>, monthYear <dttm>
logs_plot <- readRDS ( 'data/logs_plot.rds')

logs <- readRDS ( 'data/logs.rds')
head(logs)
# A tibble: 6 × 12
  timestamp           currentLocation        participantId currentMode
  <dttm>              <chr>                          <dbl> <chr>      
1 2022-04-26 00:25:00 POINT (-2619.97535009…           486 AtHome     
2 2022-04-26 00:30:00 POINT (1488.843264587…             4 AtHome     
3 2022-04-26 00:30:00 POINT (-2619.97535009…           486 AtHome     
4 2022-04-26 00:35:00 POINT (1488.843264587…             4 AtHome     
5 2022-04-26 00:35:00 POINT (-2619.97535009…           486 AtHome     
6 2022-04-26 00:40:00 POINT (1488.843264587…             4 AtHome     
# … with 8 more variables: hungerStatus <chr>, sleepStatus <chr>,
#   apartmentId <dbl>, availableBalance <dbl>, jobId <dbl>,
#   financialStatus <chr>, dailyFoodBudget <dbl>,
#   weeklyExtraBudget <dbl>

Data Visualization

Time spent over the days in travelling March 2022

march <- data_travel %>%
  filter(year %in% c('2022'))%>%
  group_by(monthYear,participantId) %>%
  summarise(travelTime = mean(abs(travelTime)))
p<- ggplot(march, aes(x=monthYear, y=travelTime, group=participantId)) +
  geom_line(aes(color=participantId),show.legend = TRUE)+
  labs(
    y= 'Average Time',
    x= 'months -2022',
    title = "Average time spent in Travel - 2022",
    caption = "Ohio USA"
  ) +
  theme_minimal()+
  theme(axis.ticks.x= element_blank(),
        panel.background= element_blank(), 
        legend.background = element_blank(),
        plot.title = element_text(size=12, face="bold",hjust = 0.5),
        plot.subtitle = element_text(hjust = 1),
        plot.caption = element_text(hjust = 0),
        axis.title.y= element_text(angle=0))

ggplotly(p)

Plot of travel time v/s time spent at the venue

plot_ly(data = data_travel,
        x = ~travelTime,
        y = ~timeSpent,
        color = ~weekday) %>%
  layout(title = 'Travel time v/s time spent')

Animated plot of Plot of travel time v/s time spent at the venue

ggplot(data_travel, aes(x = travelTime, 
                    y = timeSpent,
                    size = participantId,
                    colour = weekday)) +
  geom_point(alpha = 0.5, show.legend = TRUE) +
  labs(title = 'Time spent at a place v/s travel time', 
       x = 'Travel time', 
       y = 'Time Spent') +
  theme_minimal()+
  transition_time(as.integer(monthYear)) +
  ease_aes('linear')

Ridge plot of time spent over the different months for both participants

participant_data = data_travel %>% group_by(weekday,month, purpose, participantId) %>%
  summarise(timeSpent = mean(timeSpent))
ggplot(data= participant_data, 
       aes(x = timeSpent, y= month, group = month, fill = month)) +
  geom_density_ridges(geom = "density_ridges_gradient", 
                      calc_ecdf = TRUE,
                      quantiles = 4, 
                      quantile_lines = TRUE,
                      alpha = .3) +
  theme_ridges() + 
  scale_fill_viridis_d(name = "Quartiles")+
  theme_bw()+
      labs(
    y= 'month',
    x= 'Time Spent',
    title = "Time Spent at Venue",
    caption = "Ohio USA"
  )+
  theme(
    axis.title.y= element_text(angle=0), axis.ticks.x= element_blank(),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 1),
    plot.caption = element_text(hjust = 0),
    axis.text.x = element_text(vjust = 0.5),
    legend.position = "none"
  )+
facet_wrap(~participantId)

Trelliscope of time spent over the months based on purpose

ggplot(data_travel,
       aes(y = timeSpent, x = factor(month,
      levels=c("Mar 22","Apr 22","May 22","Jun 22","Jul 22","Aug 22","Sep 22","Oct 22","Nov 22","Dec 22","Jan 23","Feb 23","Mar 23","Apr 23","May 23")), fill= purpose))+
  geom_col(aes(group=1))+
  xlab("Month")+
  ylab("Time Spent")+
  facet_trelliscope(~ weekday,nrow=2,ncol=2,path=".",width=800)+
  theme(axis.text.x = element_text(size=6),
        axis.text.y = element_text(size=6))

Plot of amount spent for different Purposes

spending_data = data_travel %>% group_by(weekday, participantId, purpose) %>%
  summarise(amountSpent = sum(amountSpent))
plot_ly(data=spending_data, x = ~purpose, y = ~amountSpent, color = ~weekday)  

Calendar heat map of the Average Travel Time

participant_time = data_travel %>%
  mutate(weekday = weekdays(checkInTime),
         day = day(checkInTime),
         hour = hour(checkInTime),
         participantId=as.character(participantId))%>%
  na.omit()
wkday_levels <- c('Saturday', 'Friday', 
                  'Thursday', 'Wednesday', 
                  'Tuesday', 'Monday', 
                  'Sunday')
participant_time <- participant_time %>%
  group_by(participantId) %>%
  mutate(weekday = factor(
    weekday, levels = wkday_levels),
    hour  = factor(
      hour, levels = 0:23))
participant_time%>%
  ggplot(aes(hour,weekday,fill=travelTime))+
  geom_tile(color = "white", size = 0.05)+
  theme_tufte(base_family = "Helvetica")+
  coord_equal() +
  scale_fill_gradient(name = "Time (min)",
                    low = "sky blue", 
                    high = "dark blue",
                    labels = comma,
                    na.value = 'sky blue')+
  labs(x = "Weeks of the Year",
       y = NULL,
       title = "Average Travel Time participant") +
  theme(axis.text = element_text(size = 7,margin = margin(r = -60)),
        axis.ticks.y= element_blank(),
        legend.title = element_text(size =12),
        legend.text = element_text(size = 12),
        axis.title.x = element_text(size=12),
        panel.background = element_rect(fill = 'sky blue'))+
  facet_col(~participantId)

Travel route of participant 4

logs_path <- logs_plot %>%
  mutate(Timestamp = date_time_parse(timestamp,
                                     zone= "",
                                     format = "%Y-%m-%dT%H:%M:%S"))%>%
  mutate(day=get_day(Timestamp))%>%
  filter(currentMode == "Transport")
logs_path <- logs_path %>%
  group_by(participantId, day) %>%
  summarize(m = mean(Timestamp), 
            do_union=FALSE) %>%
  st_cast("LINESTRING")
logs_path_4 <- logs_path %>%
  filter(participantId==4)
tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1)+
tm_shape(logs_path)+
tm_lines(col = "blue")+
tmap_mode("plot")

logs_path_486 <- logs_path %>%
  filter(participantId==486)
tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1)+
tm_shape(logs_path)+
tm_lines(col = "blue")+
tmap_mode("plot")

logs_4 <- logs[logs$participantId %in% c("4"),]
logs_4<- logs_4 %>%
  mutate(time= format(as.POSIXct(timestamp),   # Extract hours, minutes & seconds
                 format = "%H:%M:%S"),)
logs_4<- logs_4[,c("timestamp","time", "currentMode", "hungerStatus", "sleepStatus")]
logst_4 <- transpose(logs_4)